Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup of mkFit hit converters, and introduction of mkFit HLT customization for 2025 #47106

Merged
merged 3 commits into from
Jan 17, 2025

Conversation

mmasciov
Copy link
Contributor

@mmasciov mmasciov commented Jan 15, 2025

PR description:

This PR aims at speeding up the mkFit (input) converters and the mkFit producer, targeting HLT in 2025:
- RecoTracker/MkFit/plugins/MkFit*HitConverter.cc and convertHits.h: save mkFit layer index per hit to be used for a faster MkFitEventOfHitsProducer; use more variables available for a given Det to avoid their recomputation per hit (including hit local->global conversion); access clusters from the reference collection by index directly instead of dereferencing an OmniClusterRef
- RecoTracker/MkFit/plugins/MkFitEventOfHitsProducer.cc uses pre-computed per-hit layer index
- RecoTracker/MkFit/plugins/MkFitProducer.cc (and customization functions) : disable (by config) additional/redundant re-check of the input cluster charge (already applied during the hit creation); this should work OK for an mkFit setup where CCC is the same in all iterations
- RecoTracker/MkFitCore/interface/Hit.h : vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y); use SMatrixSym33 for direct covariance storage to avoid conversion from SVector6
- RecoTracker/MkFitCore/src/MkFinder.cc, RecoTracker/MkFitCore/src/MkFitter.cc, RecoTracker/MkFitCore/src/PropagationMPlex*.cc, RecoTracker/MkFitCore/src/Track.cc vdt, replace (slow) std::hypot with explicit sqrt(x*x+y*y)

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules:

  • MkFitEventOfHitsProducer : -40%
  • MkFitSiStripHitConverter: -35%
  • MkFitSiPixelHitConverter: -40%
  • MkFitProducer: -14% (7% is the faster math, 7% from not updating the cluster mask with CCC)

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

PR validation:

This PR was validated using both offline and HLT configurations.
For offline: http://uaf-10.t2.ucsd.edu/~mmasciov/MIC/HLTTracking/offlineMTV_TTbarPU_2024_PR155/
--> Only differences are at the level of fluctuations, consistently with the proposed changes.

FYI: @cms-sw/tracking-pog-l2, @slava77, @mtosi, @missirol

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 15, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mmasciov for master.

It involves the following packages:

  • RecoTracker/IterativeTracking (reconstruction)
  • RecoTracker/MkFit (reconstruction)
  • RecoTracker/MkFitCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmasciov
Copy link
Contributor Author

please test

@fwyzard
Copy link
Contributor

fwyzard commented Jan 15, 2025

replace (slow) std::hypot with explicit sqrt(x*x+y*y)

do you know why std::hypot is slower ?
is it because it uses double precision while the explicit replacement uses single precision ?

@fwyzard
Copy link
Contributor

fwyzard commented Jan 15, 2025

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules

I'm curious, what is the timing reduction measured using the framework report or the FastTimerService ?

@mmusich
Copy link
Contributor

mmusich commented Jan 15, 2025

do you know why std::hypot is slower ?

https://stackoverflow.com/questions/32435796/when-to-use-stdhypotx-y-over-stdsqrtxx-yy

Looks like builtin overflow / underflow checks slow it down?

@mmasciov
Copy link
Contributor Author

Timing reduction (based on callgrind) in MC ttbar with PU using mkFit at HLT configuration is overall ~30% in mkFit-related modules

I'm curious, what is the timing reduction measured using the framework report or the FastTimerService ?

When running the same HLT configuration on the same machine, I had checked the FastTimerService output for MkFitEventOfHitsProducer, MkFitSiPixelHitConverter and MkFitSiStripHitConverter, and indeed the corresponding timing was reduced by roughly 35% for all of them, consistent with the values reported above.

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 112KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43767/summary.html
COMMIT: 82b554f
CMSSW: CMSSW_15_0_X_2025-01-14-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43767/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 25 lines to the logs
  • Reco comparison results: 26037 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3932183
  • DQMHistoTests: Total failures: 16750
  • DQMHistoTests: Total nulls: 17
  • DQMHistoTests: Total successes: 3915396
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mmusich
Copy link
Contributor

mmusich commented Jan 15, 2025

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

What's the final customization function to be applied on top of the HLT menu for testing purposes?
Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 ?

@mmasciov
Copy link
Contributor Author

For testing of the HLT configuration, PR cms-data/RecoTracker-MkFit#15 is required.

What's the final customization function to be applied on top of the HLT menu for testing purposes? Is it RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 ?

Correct. As mentioned at today's TSG meeting, I'm going to push a new commit, renaming RecoTracker/MkFit/customizeHLTIter0ToMkFit.customizeHLTIter0ToMkFitFor2025 so that it's picked up by the existing .7 workflows.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #47106 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@mmasciov
Copy link
Contributor Author

please test

@slava77
Copy link
Contributor

slava77 commented Jan 16, 2025

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? it seems to be off by 100.

E.g. the topmost line CkfTrackCandidateMaker tobTecStepTrackCandidates
PR is 3.864670 % (100*moduleTime/jobTime), baseline is 3.658508 % and 0.206161 is just 3.864670 - 3.658508, which is just 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red

@makortel
Copy link
Contributor

Just for my understanding: although the metric is not the same (FastTimer vs callgrind), the differences in MkFitSiPixelHitConverter, MkFitSiStripHitConverter and MkFitEventOfHitsProducer do not seem to be so pronounced in the Offline profiling test https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43777/profiling/13034.21/diff-step3_cpu.resources.json.html despite detachedTripletStepTrackCandidatesMkFit and pixelLessStepTrackCandidatesMkFit mainly are giving an overall reduction of 40% for MkFitProducer, do we understand why?

Isn't the printout buggy in the cpu time fraction diff percent column? it seems to be off by 100.

E.g. the topmost line CkfTrackCandidateMaker tobTecStepTrackCandidates PR is 3.864670 % (100*moduleTime/jobTime), baseline is 3.658508 % and 0.206161 is just 3.864670 - 3.658508, which is just 0.2% (essentially no change), but the cpu time fraction diff percent column shows 20.616146% and is highlighted in red

Could you copy that comment to #43166 ?

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9da047/43793/summary.html
COMMIT: d578a59
CMSSW: CMSSW_15_0_X_2025-01-16-1100/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47106/43793/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 43 lines to the logs
  • Reco comparison results: 26035 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3932183
  • DQMHistoTests: Total failures: 16778
  • DQMHistoTests: Total nulls: 17
  • DQMHistoTests: Total successes: 3915368
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.10200000000000001 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 140.045,... ): -0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 141.042 ): 0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.301 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.408 ): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.5 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 145.604 ): -0.098 KiB JetMET/SUSYDQM
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mmusich
Copy link
Contributor

mmusich commented Jan 17, 2025

FWIW, I run HLT integration tests using this branch adding the following patch on top:

diff --git a/HLTrigger/Configuration/python/customizeHLTforCMSSW.py b/HLTrigger/Configuration/python/customizeHLTforCMSSW.py
index 3f800fab933..c7f59ec0e70 100644
--- a/HLTrigger/Configuration/python/customizeHLTforCMSSW.py
+++ b/HLTrigger/Configuration/python/customizeHLTforCMSSW.py
@@ -181,5 +181,8 @@ def customizeHLTforCMSSW(process, menuType="GRun"):
     process = customizeHLTfor47079(process)
     process = customizeHLTfor47047(process)
 
+    from RecoTracker.MkFit.customizeHLTIter0ToMkFit import customizeHLTIter0ToMkFit
+    process = customizeHLTIter0ToMkFit(process)
+
     return process
 

by running

cd HLTrigger/Configuration/test/
setenv CMS_PATH "/cvmfs/cms-ib.cern.ch"
setenv SITECONFIG_PATH "/cvmfs/cms-ib.cern.ch/SITECONF/local"
./runAll.csh GRun

The full log is available here, while the results are here.
I didn't observe any issues.

@jfernan2
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @mandrenguyen, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
Notice This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-data/RecoTracker-MkFit#15

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 403d6b0 into cms-sw:master Jan 17, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants